Mining and Re-ranking for Answering Biographical Queries on the Web

Authors

  • Donghui Feng
  • Deepak Ravichandran
  • Eduard H. Hovy
Abstract

The rapid growth of the Web has made it a huge and valuable knowledge base. Within it, biographical information is of great interest to society. However, there has not yet been an efficient and complete approach to automated biography creation by querying the Web. This paper describes an automatic web-based question answering system for biographical queries. Ad-hoc improvements on pattern learning approaches are proposed for mining biographical knowledge. Using bootstrapping, our approach learns surface text patterns from the Web and applies the learned patterns to extract relevant information. To reduce human labeling cost, we propose a new IDF-inspired re-ranking approach and compare it with a re-ranking approach based on pattern precision. A comparative study of the two re-ranking models is conducted. The tested system produces promising results for answering biographical queries.
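
The abstract gives no implementation details, so the following is only a minimal sketch of what applying surface text patterns and ranking the extracted candidates by pattern precision might look like. The patterns, the precision values, and the names `PATTERNS` and `rank_candidates` are all illustrative assumptions, not taken from the paper, and the IDF-inspired variant is not shown because the abstract does not define it.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns for a "birth year" relation. Each template is
# paired with an assumed precision; the numbers are made up for illustration.
PATTERNS = [
    (r"{name} \(\s*(\d{{4}})\s*-", 0.85),      # e.g. "Mozart (1756 - 1791)"
    (r"{name} was born in (\d{{4}})", 0.90),   # e.g. "Mozart was born in 1756"
    (r"{name}.*?(\d{{4}})", 0.20),             # loose, low-precision fallback
]


def rank_candidates(name, sentences):
    """Extract candidate answers and rank them by summed pattern precision."""
    scores = defaultdict(float)
    for template, precision in PATTERNS:
        # Instantiate the template with the (escaped) query name.
        regex = re.compile(template.format(name=re.escape(name)))
        for sentence in sentences:
            for match in regex.finditer(sentence):
                # Each match votes for its candidate with the pattern's precision.
                scores[match.group(1)] += precision
    # Highest accumulated score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    snippets = [
        "Mozart (1756 - 1791) was a composer.",
        "Mozart was born in 1756 in Salzburg.",
        "Mozart visited Paris in 1778.",
    ]
    for candidate, score in rank_candidates("Mozart", snippets):
        print(candidate, round(score, 2))
```

In this toy input, the candidate supported by the two high-precision patterns ranks above the one matched only by the loose fallback pattern. A real system would learn both the patterns and their precision estimates from data rather than hard-coding them.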

Similar resources

NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task

Users express their information needs in terms of queries to find the relevant documents on the web. However, users’ queries are usually short, so search engines may not have enough information to determine their exact intents. How to diversify web search results to cover users’ possible intents as widely as possible is an important research issue. In this paper, we will propose several subt...

A New Model for Phrase Search Based on Weighted Minimum Displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...

Maximum Entropy Context Models for Ranking Biographical Answers to Open-Domain Definition Questions

In the context of question-answering systems, there are several strategies for scoring candidate answers to definition queries including centroid vectors, bi-term and context language models. These techniques use only positive examples (i.e., descriptions) when building their models. In this work, a maximum entropy based extension is proposed for context language models so as to account for reg...

A Frequency Mining-Based Algorithm for Re-ranking Web Search Engine Retrievals

Conventional web search engines retrieve too many documents for the majority of submitted queries; they therefore have good recall, since there are far more pages than a user can look at. Precision, however, is a critical factor in these conditions, because the most relevant documents should be presented at the top of the list. In this paper, we propose an online page re-rank model whi...

Spoken question answering using tree-structured conditional random fields and two-layer random walk

In this paper, we consider a spoken question answering (QA) task in which the questions are in the form of speech, while the knowledge source for answers is the set of webpages (in text) on the Internet, accessed through an information retrieval engine; we mainly focus on the query formulation and re-ranking parts. Because the recognition results for the spoken questions are less reliable, we use N-bes...

Publication date: 2006